Week 8: Adding Features

library(spotifyr)
library(compmus)
library(tidyverse)
library(circlize)
library(DT)

This Week

  • Sign up for presentations (see the link on Canvas)
  • Remaining Important Dates:
    • Today: Adding more features and discussing the final project.
    • Wednesday: Timbre and Form
    • Friday: First draft due
      • This is a complete (roughly 10-12 page) draft.
    • Next Monday and Tuesday: Individual meetings (first draft).
    • Next Wednesday: Presentation Day 1
    • Monday (Week 10): No class; Memorial Day
    • Wednesday (Week 10): Presentation Day 2

Already Available Features

Global features of interest

Metadata we’ve been using

  • artist_name
  • album_release_date
  • album_release_year
  • album_release_date_precision
  • available_markets
  • track_name
  • album_name

Continuous Variables

  • danceability
  • energy
  • loudness
  • speechiness
  • acousticness
  • instrumentalness
  • liveness
  • valence
  • tempo
  • duration_ms
  • key_confidence
  • mode_confidence
  • time_signature_confidence
  • tempo_confidence
  • end_of_fade_in
  • start_of_fade_out
  • duration

Continuous Variables from Lyrics

  • TF-IDF
  • Sentiment analysis

Categorical Variables

  • mode
  • explicit
  • key
  • key_name
  • mode_name
  • key_mode
  • time_signature

Additional Features We Might Explore

  • Relationship to the broader key profile
  • Transition probabilities
  • Timbral markers

Relationship to the Broader Key Profile

One way of exploring a piece is by looking at how it fits within a broader key profile. For example, if we have one key profile taken from a large collection, how does a specific piece relate to that collection in terms of pitch content?
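The idea is just a correlation between two 12-dimensional pitch-class vectors. As a toy illustration (not the compmus pipeline below; the collection profile here is the well-known Krumhansl–Kessler major-key profile, used only as a stand-in, and the track profile is made up):

```r
# Toy illustration: correlate a track's 12-dimensional pitch-class
# profile with a collection-wide profile (C, C#, D, ..., B).
collection_profile <- c(6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                        2.52, 5.19, 2.39, 3.66, 2.29, 2.88)
track_profile <- c(6.1, 2.0, 3.5, 2.5, 4.2, 4.0,
                   2.6, 5.0, 2.4, 3.7, 2.3, 2.9)
cor(track_profile, collection_profile)  # close to 1: very similar pitch content
```

A value near 1 means the track's pitch content closely mirrors the collection; values near 0 or below suggest it is an outlier.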

Here, we can start by getting a key profile of a playlist.

grab_playlist_info <- function(uri){
   get_playlist_audio_features("", uri) |>
   add_audio_analysis() 
}
playlist <- grab_playlist_info("37i9dQZF1DX1kCIzMYtzum")  

Then we can grab chroma and pitches with code from earlier in the quarter (adapted from Burgoyne's compmus examples):

get_pitch_list <- function(input){
   ##burgoyne's comp_mus code for gathering key profiles from chroma.
   input |>     
   mutate(segments = map2(segments, key, compmus_c_transpose)) |>
   select(segments) |>
   unnest(segments) |> 
   select(start, duration, pitches) |> 
   mutate(pitches = map(pitches, compmus_normalise, "euclidean")) |>
   compmus_gather_chroma() |>
   group_by(pitch_class) |>
   summarise(mean_value = mean(value))
}

Then we just need to compute each track's pitch profile and correlate it with the playlist-wide profile (here I’ve used a loop, which might not be the most efficient way to do it in R).

pitch_list <- get_pitch_list(playlist)
playlist$pitch_cor <- NA
for(i in 1:nrow(playlist)){
    pitch <- get_pitch_list(playlist[i,])
    playlist$pitch_cor[i] <- cor(pitch$mean_value, pitch_list$mean_value)
}
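The loop above could also be written with purrr. Here is a minimal sketch of the pattern on toy vectors (the `reference` and `profiles` objects are made-up stand-ins; the real version would map `get_pitch_list()` over the rows of `playlist`):

```r
library(purrr)

reference <- c(10, 8, 6, 4)              # stand-in for pitch_list$mean_value
profiles <- list(a = c(9, 8, 7, 4),      # stand-ins for per-track profiles
                 b = c(4, 6, 8, 10))

# map_dbl applies the correlation to each profile, returning a named vector
cors <- map_dbl(profiles, ~ cor(.x, reference))
cors  # a is positively correlated, b negatively
```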

Exercise

  1. Can you grab a collection, and then look at how each piece in that collection relates to the broader key profile?

Transition Probabilities

We could also grab transition probabilities from note to note. Here we reuse earlier code to extract the strongest chroma in each segment and pair it with the chroma that follows.

chroma_names <- c("C", "C#|Db","D", "D#|Eb", "E", "F", "F#|Gb","G", "G#|Ab","A", "A#|Bb","B" )


x <- playlist |>  
    mutate(segments = map2(segments, key, compmus_c_transpose)) |>
    select(segments) |>
    unnest(segments) |>
    select(start, duration, pitches) |>
    unnest(cols = pitches)
x$chroma <- rep(chroma_names, nrow(x)/12)
x <- x |>
  filter(pitches == 1) |>
  mutate(chroma2 = lead(chroma))
x |> select(chroma, chroma2) |> table() |> heatmap(Rowv = NA, Colv = NA)

We might also want to run it as proportions, rather than raw counts:

pairs <-  x |> select(chroma, chroma2) |> table()
prop.table(pairs) |> heatmap(Rowv = NA, Colv = NA)
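The core of the computation is small. On a toy chroma sequence (illustrative data only, in base R rather than the tidyverse pipeline above) it looks like this:

```r
# Toy example of counting chroma-to-chroma transitions:
chroma_seq <- c("C", "E", "G", "C", "E", "G", "C")

# pair each chroma with its successor and cross-tabulate
transitions <- table(from = head(chroma_seq, -1),
                     to = tail(chroma_seq, -1))
prop.table(transitions)  # C->E, E->G, and G->C each occur a third of the time
```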

We can convert this data to rows and columns like this, and can then move toward adding it to the dataset.

grab_pitch_pairs <- function(input){
    x <- input |>  
      mutate(segments = map2(segments, key, compmus_c_transpose)) |>
      select(segments) |>
      unnest(segments) |>
      select(start, duration, pitches) |>
      unnest(cols = pitches)

    x$chroma <- rep(chroma_names, nrow(x)/12)
    x <- x |>
      filter(pitches == 1) |>
      mutate(chroma2 = lead(chroma))

    # cross-tabulate the transitions and convert counts to proportions
    pairs <- x |> select(chroma, chroma2) |> table()
    pair_proportion <- as.matrix(prop.table(pairs))

    # melt the matrix into long format
    df <- reshape2::melt(pair_proportion, na.rm = TRUE)
    df$combined <- as.factor(paste0(df$chroma, "-", df$chroma2))
    df <- as_tibble(df)
    y <- df |> select(value, combined)
    print(y)
}

This is how we’d get the transitions from each pitch:

n <- grab_pitch_pairs(playlist) 
# A tibble: 144 × 2
     value combined
     <dbl> <fct>   
 1 0.0186  A-A     
 2 0.00363 A#|Bb-A 
 3 0.00313 B-A     
 4 0.00870 C-A     
 5 0.00122 C#|Db-A 
 6 0.00358 D-A     
 7 0.00220 D#|Eb-A 
 8 0.00396 E-A     
 9 0.00342 F-A     
10 0.00174 F#|Gb-A 
# ℹ 134 more rows

And we can pivot it to a wide table format with pivot_wider.

n |> pivot_wider(names_from = combined, values_from = value)
# A tibble: 1 × 144
   `A-A` `A#|Bb-A`   `B-A`   `C-A` `C#|Db-A`   `D-A` `D#|Eb-A`   `E-A`   `F-A`
   <dbl>     <dbl>   <dbl>   <dbl>     <dbl>   <dbl>     <dbl>   <dbl>   <dbl>
1 0.0186   0.00363 0.00313 0.00870   0.00122 0.00358   0.00220 0.00396 0.00342
# ℹ 135 more variables: `F#|Gb-A` <dbl>, `G-A` <dbl>, `G#|Ab-A` <dbl>,
#   `A-A#|Bb` <dbl>, `A#|Bb-A#|Bb` <dbl>, `B-A#|Bb` <dbl>, `C-A#|Bb` <dbl>,
#   `C#|Db-A#|Bb` <dbl>, `D-A#|Bb` <dbl>, `D#|Eb-A#|Bb` <dbl>, `E-A#|Bb` <dbl>,
#   `F-A#|Bb` <dbl>, `F#|Gb-A#|Bb` <dbl>, `G-A#|Bb` <dbl>, `G#|Ab-A#|Bb` <dbl>,
#   `A-B` <dbl>, `A#|Bb-B` <dbl>, `B-B` <dbl>, `C-B` <dbl>, `C#|Db-B` <dbl>,
#   `D-B` <dbl>, `D#|Eb-B` <dbl>, `E-B` <dbl>, `F-B` <dbl>, `F#|Gb-B` <dbl>,
#   `G-B` <dbl>, `G#|Ab-B` <dbl>, `A-C` <dbl>, `A#|Bb-C` <dbl>, `B-C` <dbl>, …

We can put all of this together like so (using the playlist variable from before).

chroma_names <- c("C", "C#|Db","D", "D#|Eb", "E", "F", "F#|Gb","G", "G#|Ab","A", "A#|Bb","B" )


x <- playlist |>  
  mutate(segments = map2(segments, key, compmus_c_transpose)) |>
  select(segments, track.name) |>
  unnest(segments) |>
  select(track.name, start, duration, pitches) |>
  unnest(cols = pitches)


x$chroma <- rep(chroma_names, nrow(x)/12)

x <- x |>
  filter(pitches == 1) |>
  mutate(chroma2 = lead(chroma))  |>
  select(track.name, chroma, chroma2)


new_df <- x |>
  table() |>
  prop.table() |>
  data.frame() |>
  tibble() |>
  mutate(bigram = paste(chroma, "to", chroma2)) |>
  select(track.name, Freq, bigram) |>
  pivot_wider(names_from = bigram, values_from = Freq)
df <- cbind(playlist, new_df)

We can display this beast of a table like so.

df |> datatable(filter = "top")
Warning in instance$preRenderHook(instance): It seems your data is too big for
client-side DataTables. You may consider server-side processing:
https://rstudio.github.io/DT/server.html

We can also use purrr’s map functions to add means and standard deviations of other nested information from the audio analysis.

Adding a “Bar Confidence” Metric with Map

playlist_w_bars <- playlist |> 
  mutate(
    bar_confidence_mean = map_dbl(bars, ~mean(.x$confidence)),
    bar_confidence_sd = map_dbl(bars, ~sd(.x$confidence)))

playlist_w_bars$bar_confidence_mean
  [1] 0.4551875 0.5563667 0.5110440 0.4706951 0.4678807 0.4994444 0.5448243
  [8] 0.4741267 0.5224156 0.4981391 0.4973795 0.5483256 0.4481204 0.4867345
 [15] 0.4730292 0.5187763 0.4374865 0.5083252 0.4761505 0.5101348 0.5567176
 [22] 0.4663061 0.5225542 0.4804079 0.4510556 0.4763651 0.4767732 0.5702603
 [29] 0.4303371 0.5253263 0.5291899 0.5382901 0.6204366 0.5818095 0.4324605
 [36] 0.4884941 0.4298987 0.6130645 0.4887949 0.5684205 0.4799510 0.5596832
 [43] 0.5918621 0.5266949 0.4868889 0.4885500 0.4864300 0.4501781 0.5043301
 [50] 0.4752222 0.5283333 0.5929844 0.4114211 0.5364444 0.4656316 0.5890886
 [57] 0.5117245 0.4882697 0.4163828 0.5039880 0.4435595 0.4781609 0.4321143
 [64] 0.5435684 0.4924434 0.4906190 0.4716875 0.5467568 0.4712970 0.4896944
 [71] 0.4786147 0.5670588 0.4750476 0.5287412 0.4958068 0.4727803 0.5707126
 [78] 0.4611026 0.5301313 0.4986389 0.5179775 0.4554699 0.4903784 0.4640674
 [85] 0.4507802 0.4694762 0.5243611 0.4647125 0.4947347 0.5224038 0.4603854
 [92] 0.4898852 0.4892844 0.4572055 0.5334701 0.4332840 0.5115952 0.4767763
 [99] 0.5487789 0.4659010

Getting Timbre

Timbre is measured in Spotify with cepstra. This measurement was developed in speech analysis (the word cepstrum, plural cepstra, is a rearrangement of the word spectrum). An excellent overview can be found here.

The Spotify API writes that it is a “vector that includes 12 unbounded values roughly centered around 0. Those values are high level abstractions of the spectral surface, ordered by degree of importance.”

The first dimension is an average loudness, the second is about “brightness”, the third is about “flatness”, and the fourth through the twelfth roughly correspond to the strength of the attack.
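For intuition, a cepstrum is roughly the inverse Fourier transform of the log magnitude spectrum of a signal. The base-R sketch below shows only that underlying concept; Spotify's 12 timbre values are higher-level abstractions fitted on top of spectral data, not this raw computation:

```r
# A minimal cepstrum sketch (conceptual only, not Spotify's pipeline):
signal <- sin(2 * pi * 440 * (0:1023) / 8000)    # 440 Hz tone at 8 kHz
log_spectrum <- log(abs(fft(signal)) + 1e-10)    # log magnitude spectrum
cepstrum <- Re(fft(log_spectrum, inverse = TRUE)) / length(signal)
length(cepstrum)  # one cepstral coefficient per sample
```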

Spotify’s Timbre Functions

Timbre for “This is America”

this_is_america <-
  get_tidy_audio_analysis("0b9oOr2ZgvyQu88wzixux9") |>  
  compmus_align(bars, segments) |> 
  select(bars) |>                                     
  unnest(bars) |>                                     
  mutate(
    pitches =
      map(segments,
        compmus_summarise, pitches,
        method = "mean", norm = "euclidean"              
      )
  ) |>
  mutate(
    timbre =
      map(segments,
        compmus_summarise, timbre,
        method = "mean", norm = "euclidean"            
      )
  )

Here we can use the compmus_gather_timbre function from compmus to see the distribution of cepstra in “This is America”.

this_is_america |>
  compmus_gather_timbre() |> 
    ggplot(aes(y=value, x=basis)) + 
    geom_violin(position="dodge", alpha=0.5) +
    theme_bw()

Similar to a chromagram, we can plot the Mel-frequency cepstrum (https://en.wikipedia.org/wiki/Mel-frequency_cepstrum) to demonstrate changing timbre throughout the piece.

this_is_america |>
  compmus_gather_timbre() |>
  ggplot(
    aes(
      x = start + duration / 2,
      width = duration,
      y = basis,
      fill = value
    )
  ) +
  geom_tile() +
  labs(x = "Time (s)", y = NULL, fill = "Magnitude") +
  scale_fill_viridis_c() +                              
  theme_classic()

Comparing Solo Instrument Pieces

Let’s compare a solo trumpet and a solo flute performing the same piece (BWV 1067, Orchestral Suite No. 2). First, the trumpet:

bwv1067_trumpet <-
  get_tidy_audio_analysis("6Tv19wcEeyvNBmhRGY59bY") |>  
  compmus_align(bars, segments) |> 
  select(bars) |>                                     
  unnest(bars) |>                                     
  mutate(
    pitches =
      map(segments,
        compmus_summarise, pitches,
        method = "mean", norm = "euclidean"              
      )
  ) |>
  mutate(
    timbre =
      map(segments,
        compmus_summarise, timbre,
        method = "mean", norm = "euclidean"            
      )
  )

bwv1067_trumpet |>
  compmus_gather_timbre() |>
  ggplot(
    aes(
      x = start + duration / 2,
      width = duration,
      y = basis,
      fill = value
    )
  ) +
  geom_tile() +
  labs(x = "Time (s)", y = NULL, fill = "Magnitude") +
  scale_fill_viridis_c() +                              
  theme_classic()

And now the flute (this recording):

bwv1067_flute <-
  get_tidy_audio_analysis("2Ej8j8vN0hlRulT2DJKu52") |>  
  compmus_align(bars, segments) |> 
  select(bars) |>                                     
  unnest(bars) |>                                     
  mutate(
    pitches =
      map(segments,
        compmus_summarise, pitches,
        method = "mean", norm = "euclidean"              
      )
  ) |>
  mutate(
    timbre =
      map(segments,
        compmus_summarise, timbre,
        method = "mean", norm = "euclidean"            
      )
  )

bwv1067_flute |>
  compmus_gather_timbre() |>
  ggplot(
    aes(
      x = start + duration / 2,
      width = duration,
      y = basis,
      fill = value
    )
  ) +
  geom_tile() +
  labs(x = "Time (s)", y = NULL, fill = "Magnitude") +
  scale_fill_viridis_c() +                              
  theme_classic()

Exercise:

How might we incorporate timbre in our own research questions?

I have a theory that tempo and brightness are related in our playlist. Let’s test it.

timbre <- playlist |>  
  compmus_align(bars, segments) |> 
  select(track.name, bars) |>                                     
  unnest(bars) |>                                     
  mutate(
    pitches =
      map(segments,
        compmus_summarise, pitches,
        method = "mean", norm = "euclidean"              
      )
  ) |>
  mutate(
    timbre =
      map(segments,
        compmus_summarise, timbre,
        method = "mean", norm = "euclidean"            
      )
  )

timbre_coeffs <- 
  timbre |>
  compmus_gather_timbre() |> 
  group_by(track.name, basis) |> 
  summarise(mean_timbre = mean(value), .groups = "drop") |> 
  pivot_wider(names_from = basis, values_from = mean_timbre)

new_playlist <- merge(timbre_coeffs, playlist)

new_playlist |> datatable(filter="top")
Warning in instance$preRenderHook(instance): It seems your data is too big for
client-side DataTables. You may consider server-side processing:
https://rstudio.github.io/DT/server.html

I now have a dataframe that includes the timbre coefficients, so let’s look at how brightness (here operationalized as c02) corresponds with tempo.

cor(new_playlist$c02, new_playlist$tempo)
[1] 0.2465019

It’s not a terribly strong correlation, but perhaps we should plot it anyway.

plot(c02 ~ tempo, data=new_playlist)
abline(lm(c02 ~ tempo, data=new_playlist), col="red")

summary(lm(c02 ~ tempo, data=new_playlist))

Call:
lm(formula = c02 ~ tempo, data = new_playlist)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.76475 -0.09058  0.02577  0.13441  0.28620 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)  
(Intercept) -0.187786   0.191921  -0.978   0.3303  
tempo        0.003732   0.001482   2.518   0.0134 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1783 on 98 degrees of freedom
Multiple R-squared:  0.06076,   Adjusted R-squared:  0.05118 
F-statistic:  6.34 on 1 and 98 DF,  p-value: 0.01342

So it’s significant, but not terribly useful (not accounting for much variance).
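As a sanity check, in a simple (one-predictor) regression the Multiple R-squared is just the squared correlation coefficient, so we can recover it directly from the correlation we computed above:

```r
r <- 0.2465019   # the correlation between c02 and tempo from above
r^2              # about 0.0608, matching the Multiple R-squared in the summary
```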

Self-Similarity Matrices and Form

We can look at musical form through the use of self-similarity matrices. There are some nice technical explanations of them here and here.

Put succinctly, we want to compare each element of the sequence of musical events with one another.
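In miniature, that comparison looks like this: compute a similarity (here cosine similarity) between every pair of time steps and arrange the results in a square matrix. The feature vectors below are made up purely for illustration:

```r
# Toy self-similarity matrix with cosine similarity (illustrative features):
cosine_sim <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

feats <- rbind(c(1, 0), c(1, 0), c(0, 1))   # 3 time steps, 2 features
steps <- nrow(feats)
ssm <- outer(seq_len(steps), seq_len(steps),
             Vectorize(function(i, j) cosine_sim(feats[i, ], feats[j, ])))
ssm  # steps 1 and 2 are identical (similarity 1); step 3 differs (similarity 0)
```

Identical stretches of music light up as blocks and repeated passages as diagonal paths, which is exactly the structure Müller describes below.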

Müller writes:

The two most prominent structures in SSMs […] are referred to as blocks and paths. If the feature sequence captures musical properties that stay somewhat constant over the duration of an entire musical part, each of the feature vectors is similar to all other feature vectors within this segment. As a result, an entire block of large values appears in the SSM. In other words, homogeneity properties correspond to block-like structures. If the feature sequence contains two repeating subsequences (e.g., two segments corresponding to the same melody), the corresponding elements of the two subsequences are similar to each other. As a result, a path (or stripe) of high similarity running parallel to the main diagonal becomes visible in the SSM. In other words, repetitive properties correspond to path-like structures. (from this notebook)

From Müller

Brahms’s “Hungarian Dance No. 5” (performed by Isaac Stern)

brahms_stern <-
  get_tidy_audio_analysis("1PKtuxuLUbXeJNa05bfAOT")  |> 
  compmus_align(bars, segments) |>                     
  select(bars) |>                                      
  unnest(bars) |>                                      
  mutate(
    pitches =
      map(segments,
        compmus_summarise, pitches,
        method = "rms", norm = "manhattan"              
      )
  ) |>
  mutate(
    timbre =
      map(segments,
        compmus_summarise, timbre,
        method = "rms", norm = "manhattan"              
      )
  )

brahms_stern |>
  compmus_self_similarity(timbre, "cosine") |> 
  ggplot(
    aes(
      x = xstart + xduration / 2,
      width = xduration,
      y = ystart + yduration / 2,
      height = yduration,
      fill = d
    )
  ) +
  geom_tile() +
  coord_fixed() +
  scale_fill_viridis_c(guide = "none") +
  theme_classic() +
  labs(x = "", y = "")

Brahms’s “Hungarian Dance No. 5” (Abbado)

brahms_abbado <-
  get_tidy_audio_analysis("02TadnJNMcVjr4baY39H1p")  |> 
  compmus_align(bars, segments) |>                     
  select(bars) |>                                      
  unnest(bars) |>                                      
  mutate(
    pitches =
      map(segments,
        compmus_summarise, pitches,
        method = "rms", norm = "euclidean"              
      )
  ) |>
  mutate(
    timbre =
      map(segments,
        compmus_summarise, timbre,
        method = "rms", norm = "euclidean"              
      )
  )

brahms_abbado |>
  compmus_self_similarity(timbre, "cosine") |> 
  ggplot(
    aes(
      x = xstart + xduration / 2,
      width = xduration,
      y = ystart + yduration / 2,
      height = yduration,
      fill = d
    )
  ) +
  geom_tile() +
  coord_fixed() +
  scale_fill_viridis_c(guide = "none") +
  theme_classic() +
  labs(x = "", y = "")

Bowie’s Life on Mars

Here we can see a self-similarity matrix of David Bowie’s “Life on Mars”. Let’s listen along to it.

life_on_mars <-
  get_tidy_audio_analysis("3ZE3wv8V3w2T2f7nOCjV0N")  |> 
  compmus_align(bars, segments) |>                     
  select(bars) |>                                      
  unnest(bars) |>                                      
  mutate(
    pitches =
      map(segments,
        compmus_summarise, pitches,
        method = "rms", norm = "euclidean"              
      )
  ) |>
  mutate(
    timbre =
      map(segments,
        compmus_summarise, timbre,
        method = "rms", norm = "euclidean"              
      )
  )
life_on_mars |>
  compmus_self_similarity(timbre, "cosine") |> 
  ggplot(
    aes(
      x = xstart + xduration / 2,
      width = xduration,
      y = ystart + yduration / 2,
      height = yduration,
      fill = d
    )
  ) +
  geom_tile() +
  coord_fixed() +
  scale_fill_viridis_c(guide = "none") +
  theme_classic() +
  labs(x = "", y = "")

Exercise:

Let’s look at two performances of the same piece. How do timbres change? Are there any hypotheses that might be worth looking into?